- Path: mail2news.demon.co.uk!genesis.demon.co.uk
- From: Lawrence Kirby <fred@genesis.demon.co.uk>
- Newsgroups: comp.lang.c
- Subject: Re: Need code to remove non-adjacent duplicate lines
- Date: Sat, 30 Mar 96 20:49:01 GMT
- Organization: none
- Message-ID: <828218941snz@genesis.demon.co.uk>
- References: <1996Mar27.113154.14694@schbbs.mot.com>
- Reply-To: fred@genesis.demon.co.uk
- X-NNTP-Posting-Host: genesis.demon.co.uk
- X-Newsreader: Demon Internet Simple News v1.27
- X-Mail2News-Path: genesis.demon.co.uk
-
- In article <1996Mar27.113154.14694@schbbs.mot.com> ghelm "george_helm" writes:
-
- >Does anyone have C code that will remove *non-adjacent* duplicate
- >lines from an ascii file ? I need to retain the original file
- >format so I can't use simple stuff like sort -u or uniq in UNIX.
- >Help on this is greatly appreciated.
-
- How you approach this depends on whether the file is small enough to be
- held in memory. If it is, you build a lookup data structure keyed on the
- contents of each line. If a new line matches an entry in the data
- structure you ignore it; otherwise you add it and output it. The following
- demonstrates the algorithm:
-
- awk '
- {
-     if (!a[$0]) {
-         a[$0] = 1
-         print
-     }
- }'
-
- I leave it as an exercise for the reader to translate this to C! :-)
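
For readers who want a starting point, here is a minimal sketch of one possible C translation of the awk script above. It is not code from the original post. It assumes the whole input fits in memory and uses a fixed-size chained hash table where awk uses an associative array. The names `dedup_lines`, `read_line`, `hash_line` and `NBUCKETS` are hypothetical, and the table size and hash function are arbitrary choices made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 4096u   /* arbitrary table size, chosen for illustration */

struct node {
    struct node *next;
    char *line;
};

/* Simple multiplicative string hash; any reasonable string hash would do. */
static unsigned long hash_line(const char *s)
{
    unsigned long h = 5381;
    while (*s != '\0')
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Read one line of any length, without its newline. Returns a malloc'd
   string, or NULL at end of input or if allocation fails. */
static char *read_line(FILE *in)
{
    size_t cap = 128, len = 0;
    char *buf = malloc(cap);
    int c;

    if (buf == NULL)
        return NULL;
    while ((c = getc(in)) != EOF && c != '\n') {
        if (len + 1 == cap) {            /* keep room for the terminator */
            char *tmp = realloc(buf, cap *= 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
        }
        buf[len++] = (char)c;
    }
    if (c == EOF && len == 0) {          /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}

/* Copy in to out, dropping every line that has already been seen.
   Returns 0 on success, -1 on a read error or if memory runs out. */
int dedup_lines(FILE *in, FILE *out)
{
    struct node **table;
    char *line;
    int status = 0;
    size_t i;

    assert(in != NULL && out != NULL);
    table = malloc(NBUCKETS * sizeof *table);
    if (table == NULL)
        return -1;
    for (i = 0; i < NBUCKETS; i++)
        table[i] = NULL;

    while ((line = read_line(in)) != NULL) {
        struct node **slot = &table[hash_line(line) % NBUCKETS];
        struct node *p;

        for (p = *slot; p != NULL; p = p->next)
            if (strcmp(p->line, line) == 0)
                break;
        if (p != NULL) {                 /* seen before: ignore it */
            free(line);
            continue;
        }
        p = malloc(sizeof *p);
        if (p == NULL) {
            free(line);
            status = -1;
            break;
        }
        p->line = line;                  /* new line: remember it ... */
        p->next = *slot;
        *slot = p;
        fputs(line, out);                /* ... and output it */
        putc('\n', out);
    }
    if (status == 0 && !feof(in))        /* stopped before end of input */
        status = -1;

    for (i = 0; i < NBUCKETS; i++) {     /* release the table */
        while (table[i] != NULL) {
            struct node *p = table[i];
            table[i] = p->next;
            free(p->line);
            free(p);
        }
    }
    free(table);
    return status;
}
```

Calling `dedup_lines(stdin, stdout)` from `main` should give a filter that behaves like the awk script. Like awk, it writes a newline after the last line even if the input lacked one. With a fixed number of buckets the chains grow with the number of distinct lines, so a very large input would want a bigger or resizable table.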
-
- --
- -----------------------------------------
- Lawrence Kirby | fred@genesis.demon.co.uk
- Wilts, England | 70734.126@compuserve.com
- -----------------------------------------
-